The problem of sparse multichannel blind deconvolution (S-MBD) arises frequently in many engineering applications, such as radar, sonar, and ultrasound imaging. To reduce its computational and implementation cost, we propose a compression method that enables blind recovery from far fewer measurements in time. The proposed compression measures the signal through a filter followed by subsampling, substantially reducing implementation cost. We derive theoretical guarantees for the identifiability and recovery of a sparse filter from compressed measurements, and our results allow the design of a wide class of compression filters. We then propose a data-driven unrolled learning framework to learn the compression filter and solve the S-MBD problem. The encoder is a recurrent inference network that maps compressed measurements to an estimate of the sparse filter. We demonstrate that our unrolled learning method is more robust to the choice of source shapes and achieves better recovery performance than optimization-based methods. Finally, in applications with limited data (the few-shot regime), we highlight the superior generalization capability of unrolled learning compared to conventional deep learning.
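The filter-then-subsample compression operator described in this abstract can be sketched as below; the filter taps and subsampling factor are illustrative assumptions, not the learned compression filters from the paper.

```python
import numpy as np

def compress(signal, filt, factor):
    """Filter the signal, then subsample by `factor`.

    A minimal sketch of filter-then-subsample compression:
    a linear filter followed by keeping every `factor`-th sample.
    """
    filtered = np.convolve(signal, filt, mode="same")  # linear filtering
    return filtered[::factor]                          # subsampling

# Toy usage: a length-64 signal compressed 4x to 16 measurements.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
h = np.array([0.25, 0.5, 0.25])  # hypothetical smoothing filter
y = compress(x, h, factor=4)
```

In the paper the filter itself is learned jointly with the recovery network; here it is fixed only to show the measurement model.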
In recent years, deep-learning-based speech enhancement has shown unprecedented performance. The most popular mono speech enhancement frameworks are end-to-end networks that map the noisy mixture to an estimate of the clean speech. With growing computational power and the availability of multichannel microphone recordings, recent works aim to incorporate spatial statistics along with spectral information to boost performance. Despite the improved enhancement performance of mono outputs, spatial image preservation and subjective evaluation have not received much attention in the literature. This paper proposes a novel stereo-aware framework for speech enhancement, i.e., a training loss for deep-learning-based speech enhancement that preserves the spatial image while enhancing the stereo mixture. The proposed framework is model-independent, so it can be applied to any deep-learning-based architecture. We provide an extensive objective and subjective evaluation of the trained models through a listening test. We show that regularizing with an image-preservation loss improves overall performance and better preserves the stereo aspect of the speech.
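One way to realize such a stereo-aware training loss is a per-channel enhancement term plus a regularizer on the inter-channel difference; the functional form and the weight `alpha` below are hypothetical illustrations, not the loss defined in the paper.

```python
import numpy as np

def stereo_aware_loss(est_l, est_r, ref_l, ref_r, alpha=0.1):
    """Hypothetical stereo-aware loss: per-channel MSE enhancement
    terms plus a penalty on distortion of the inter-channel
    (spatial-image) difference signal."""
    enhancement = np.mean((est_l - ref_l) ** 2) + np.mean((est_r - ref_r) ** 2)
    image = np.mean(((est_l - est_r) - (ref_l - ref_r)) ** 2)  # spatial-image term
    return enhancement + alpha * image
```

Because the loss only compares network output to reference signals, it is architecture-agnostic, matching the model-independence claimed above.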
The dictionary learning problem, which represents data as a combination of a few atoms, has long served as a popular method for learning representations in statistics and signal processing. The most popular dictionary learning algorithm alternates between sparse coding and dictionary update steps, and a rich literature has studied its theoretical convergence. The growing popularity of neurally plausible unrolled sparse coding networks has led to the empirical finding that backpropagation through such networks performs dictionary learning. This paper provides the first theoretical proof of these empirical results through PUDLE, a provable unrolled dictionary learning framework. We highlight the impact of loss, unrolling, and backpropagation on convergence. We discover an implicit acceleration: as a function of unrolling, the backpropagated gradient converges faster and is more accurate than the gradient from alternating minimization. We complement our findings with synthetic and image-denoising experiments. The findings support the use of accelerated deep learning optimizers and unrolled networks for dictionary learning.
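The unrolled sparse coding network referred to above is typically a fixed number of ISTA iterations treated as layers; a forward-pass sketch is below. The step count and threshold are illustrative, and the dictionary update (backpropagating a reconstruction loss through these layers with respect to `D`) is omitted.

```python
import numpy as np

def soft_threshold(z, lam):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def unrolled_ista(y, D, lam=0.1, n_layers=20):
    """Unrolled ISTA encoder: each 'layer' is one ISTA iteration
    for min_z 0.5*||y - D z||^2 + lam*||z||_1."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the quadratic's gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_layers):
        z = soft_threshold(z + D.T @ (y - D @ z) / L, lam / L)
    return z
```

Training would then minimize `||y - D @ unrolled_ista(y, D)||^2` over `D` by gradient descent, which is the backpropagation-through-unrolling procedure the paper analyzes.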
Convolutional dictionary learning (CDL), the problem of estimating shift-invariant templates from data, is typically conducted in the absence of a prior or structure on the templates. In data-scarce or low signal-to-noise ratio (SNR) regimes, the learned templates overfit the data and lack smoothness, which can affect the predictive performance of downstream tasks. To address this limitation, we propose GPCDL, a convolutional dictionary learning framework that enforces a prior on the templates using Gaussian Processes (GPs). With the focus on smoothness, we show theoretically that imposing a GP prior is equivalent to Wiener filtering the learned templates, thereby suppressing high-frequency components and promoting smoothness. We show that the algorithm is a simple extension of the classical iteratively reweighted least squares algorithm, independent of the choice of GP kernel. This property allows one to experiment flexibly with different smoothness assumptions. Through simulation, we show that GPCDL learns smooth dictionaries with better accuracy than the unregularized alternative across a range of SNRs. Through an application to neural spiking data, we show that GPCDL learns more accurate and visually interpretable smooth dictionaries, leading to superior predictive performance compared to both non-regularized CDL and parametric approaches.
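The GP-prior-as-Wiener-filtering equivalence stated above can be illustrated in the frequency domain: the template is shrunk at frequencies where the kernel's power is small relative to the noise. The kernel spectrum and noise variance below are hypothetical placeholders, not the paper's estimated quantities.

```python
import numpy as np

def wiener_smooth(template, kernel_psd, noise_var):
    """Smooth a learned template by Wiener filtering.

    `kernel_psd` is the (hypothetical) power spectrum of the GP
    kernel at the rfft frequencies of `template`; frequencies where
    it is small relative to `noise_var` are suppressed, promoting
    smoothness.
    """
    T = np.fft.rfft(template)
    gain = kernel_psd / (kernel_psd + noise_var)  # Wiener gain per frequency
    return np.fft.irfft(gain * T, n=len(template))
```

With a smooth (low-pass) kernel spectrum, the gain decays at high frequencies, which is exactly the high-frequency suppression the abstract describes.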
Discriminative features extracted from the sparse coding model have been shown to perform well for classification. Recent deep learning architectures have further improved reconstruction in inverse problems by considering new dense priors learned from data. We propose a novel dense and sparse coding model that integrates both representation capability and discriminative features. The model studies the problem of recovering a dense vector $\mathbf{x}$ and a sparse vector $\mathbf{u}$ given measurements of the form $\mathbf{y} = \mathbf{A}\mathbf{x}+\mathbf{B}\mathbf{u}$. Our first analysis proposes a geometric condition based on the minimal angle between spanning subspaces corresponding to the matrices $\mathbf{A}$ and $\mathbf{B}$ that guarantees unique solution to the model. The second analysis shows that, under mild assumptions, a convex program recovers the dense and sparse components. We validate the effectiveness of the model on simulated data and propose a dense and sparse autoencoder (DenSaE) tailored to learning the dictionaries from the dense and sparse model. We demonstrate that (i) DenSaE denoises natural images better than architectures derived from the sparse coding model ($\mathbf{B}\mathbf{u}$), (ii) in the presence of noise, training the biases in the latter amounts to implicitly learning the $\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{u}$ model, (iii) $\mathbf{A}$ and $\mathbf{B}$ capture low- and high-frequency contents, respectively, and (iv) compared to the sparse coding model, DenSaE offers a balance between discriminative power and representation.
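The convex recovery of the dense and sparse components from $\mathbf{y} = \mathbf{A}\mathbf{x}+\mathbf{B}\mathbf{u}$ can be sketched with proximal gradient descent on a least-squares term plus an $\ell_1$ penalty on $\mathbf{u}$; the step size, penalty weight, and iteration count are illustrative assumptions, not the paper's convex program as stated.

```python
import numpy as np

def dense_sparse_recover(y, A, B, lam=0.1, n_iter=200):
    """Proximal-gradient sketch for
    min_{x,u} 0.5*||y - A x - B u||^2 + lam*||u||_1,
    with a gradient step on the dense part x and a
    soft-thresholded step on the sparse part u."""
    M = np.hstack([A, B])
    step = 1.0 / np.linalg.norm(M, 2) ** 2  # 1/L for the smooth term
    x = np.zeros(A.shape[1])
    u = np.zeros(B.shape[1])
    for _ in range(n_iter):
        r = A @ x + B @ u - y                # residual
        x = x - step * (A.T @ r)             # gradient step (dense component)
        u = u - step * (B.T @ r)             # gradient step (sparse component)
        u = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)  # prox of lam*||.||_1
    return x, u
```

The uniqueness condition in the abstract (a minimal angle between the spans of `A` and `B`) is what keeps the two components from being confused; the toy test below uses orthogonal subspaces, where separation is easiest.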
We outline our work on evaluating robots that assist older adults by engaging with them through multiple modalities that include physical interaction. Our thesis is that to increase the effectiveness of assistive robots: 1) robots need to understand and effect multimodal actions, 2) robots should not only react to the human, they need to take the initiative and lead the task when it is necessary. We start by briefly introducing our proposed framework for multimodal interaction and then describe two different experiments with the actual robots. In the first experiment, a Baxter robot helps a human find and locate an object using the Multimodal Interaction Manager (MIM) framework. In the second experiment, a NAO robot is used in the same task, however, the roles of the robot and the human are reversed. We discuss the evaluation methods that were used in these experiments, including different metrics employed to characterize the performance of the robot in each case. We conclude by providing our perspective on the challenges and opportunities for the evaluation of assistive robots for older adults in realistic settings.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.